Web-Site Boundary Detection Using Incremental RandomWalk Clustering

نویسندگان

  • Ayesh Alshukri
  • Frans Coenen
  • Michele Zito
چکیده

In this paper we describe a random walk clustering technique to address the Website Boundary Detection (WBD) problem. The technique is fully described and compared with alternative (breadth and depth first) approaches. The reported evaluation demonstrates that the random walk technique produces comparable or better results than those produced by these alternative techniques, while at the same time visiting fewer ‘noise’ pages. To demonstrate that the good results are not simply a consequence of a randomisation of the input data we also compare with a random ordering technique.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Incremental Web-Site Boundary Detection Using Random Walks

The paper describes variations of the classical k-means clustering algorithm that can be used effectively to address the so called Web-site Boundary Detection (WBD) problem. The suggested advantages offered by these techniques are that they can quickly identify most of the pages belonging to a web-site; and, in the long run, return a solution of comparable (if not better) accuracy than other cl...

متن کامل

Web-Site Boundary Detection

Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to boundary detection is suggested. The principal issue is feature selection, hampered by the observation that there is no clear understanding of what a web-site is. This paper proposes a definition of a web-site, founde...

متن کامل

A Hybrid Framework for Building an Efficient Incremental Intrusion Detection System

In this paper, a boosting-based incremental hybrid intrusion detection system is introduced. This system combines incremental misuse detection and incremental anomaly detection. We use boosting ensemble of weak classifiers to implement misuse intrusion detection system. It can identify new classes types of intrusions that do not exist in the training dataset for incremental misuse detection. As...

متن کامل

An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.

متن کامل

Clustering Web Content for Efficient Replication

Recently there has been an increasing deployment of content distribution networks (CDNs) that offer hosting services to Web content providers. In this paper, we first compare the un-cooperative pulling of Web contents used by commercial CDNs with the cooperative pushing. Our results show that the latter can achieve comparable users’ perceived performance with only 4 5% of replication and update...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011